ABSTRACT

Applied researchers frequently precede analyses of 
interest with one or more preliminary tests, used to help researchers 
determine which variables to examine more closely, or whether there 
are anomalies in the data set. These tests can be classified into
three categories: omnibus tests, tests for model fit, and exploratory 
tests. Fifty-four applied journal articles, 31 statistical textbooks, 
and the statistical literature are reviewed to discuss some 
limitations associated with uses of some preliminary tests. The focus 
is on the following topics: (1) the omnibus F-test in analysis of 
variance; (2) tests for variance equality; (3) tests for equality of 
regression slopes in analysis of covariance; and (4) tests for
sphericity in repeated measures designs. In general, it is concluded 
that many preliminary statistical tests are not useful. In many 
contexts, omnibus tests do not answer questions of substantive 
interest. Preliminary analyses in tests for model fit are unnecessary 
because alternative less restrictive models can be used, and because 
many tests for violations of data assumptions lack adequate 
statistical power, or are overly sensitive to another assumption 
violation. More focused analyses are advocated, using less 
restrictive analytical models, and an increased use of exploratory 
analyses is recommended. One table illustrates the discussion. Lists 
of the journal articles and textbooks reviewed are appended. 
(Contains 43 references.) (SLD) 
ABSTRACT

Applied researchers frequently precede analyses of interest 
with one or more preliminary tests. These tests can be 
classified into three categories: 1) omnibus tests; 2) tests for 
model fit; and 3) exploratory tests. The present paper reviews a 
sample of applied journal articles, statistical textbooks, and 
the statistical literature to discuss some limitations associated 
with uses of some preliminary tests. In general it is concluded 
that many preliminary tests are not useful. We advocate more 
focused analyses using less restrictive statistical models, and 
recommend an increased use of exploratory analyses. 
Preliminary Statistical Tests

Researchers interested in answering substantive questions 
with specific analyses often precede their analyses with one or 
more preliminary statistical tests. These preliminary tests can 
be classified into one of three categories: a) omnibus tests; b) 
tests for model fit; and c) exploratory analyses. 

An omnibus test, that is, the simultaneous test of several
hypotheses in a single analysis, is frequently examined before
individual hypotheses are tested. A purpose often given for this
test is to limit the risk of making a Type I error across
multiple follow-up tests. Application of this preliminary test
is very popular and examples can be found throughout the applied 
research literature; this test dominates statistical methods 
textbooks. The simultaneous comparison of several population 
means through the analysis of variance F-test is a good example 
of an omnibus test that is often conducted before specific 
hypotheses are tested through contrast analyses. Other examples 
include the test of the full model in multiple regression 
analysis before specific coefficients are tested, and a 
multivariate test for the simultaneous equality of several 
outcome measures before a series of univariate tests is 
conducted.

Each statistical test is based on a mathematical model that 
has been formulated assuming a specific data structure. 
Preliminary statistical tests can be completed in an effort to
determine whether there is sufficient evidence to support the
conclusion that the observed data do not fit the assumed model. 
For example, the t-test for comparing two independent population 
means is based on the assumptions that the outcome variable 
measures are independent and the population distributions are 
normal with equal variance. A violation of any of these 
assumptions can invalidate the probability statements made in 
drawing conclusions from the analysis (Glass, Peckham, & Sanders, 
1972). Heterogeneous population variances can, for example, 
result in a risk of a Type I error greater than the nominal 
level. The Hartley F-max and the Bartlett chi-squared tests are 
well known procedures that might be used to determine whether 
populations have equal variances. Other examples of model fit 
include a preliminary test on higher order interaction effects to 
justify a main effects model, and a test for linearity before 
accepting a linear regression model.

As an exploratory analytic technique, preliminary tests are 
used to help researchers determine which variables to examine 
more closely or to determine whether there are anomalies in the 
data set. Examples might include the use of variable selection 
procedures to identify subsets of variables to include in a 
model, or the use of factor or component analysis to reduce the 
number of predictor variables to be considered in a regression 
model. The examination of data sets for outliers by using the 
Cook distance statistic is still another example where a 
preliminary analysis is conducted before specific hypotheses are 
tested.

The routine application of preliminary tests is often 
recommended by instructors and textbook writers. The purpose of 
the present paper is to discuss the limitations associated with 
uses of some preliminary tests. Our discussion is based on a
review of a convenience sample of statistical methods textbooks, 
a review of some published educational and psychological research 
articles, and a review of the statistical literature found by 
searching through the Current Index to Statistics (CIS) from 1986 
to the present. In the statistical literature, preliminary tests
are occasionally referred to as "tests under conditional
specifications."

The review of journal articles was limited to either
articles published in the Journal of Experimental Education 
Volumes 58 (1990-91) and 59 (1991-92), or empirical research 
published since 1979 on the effectiveness of study strategies 
with post secondary students. The Journal of Experimental 
Education was chosen because we felt that articles published in 
this journal are fairly representative of competent quantitative 
research inquiries conducted by behavioral science researchers. 
Studies on the effectiveness of study strategies were chosen 
because of a research interest of the first author and because 
investigations in this area can be found in a wide variety of 
behavioral science journals. Bibliographic information on 54 
articles reviewed is given in Appendix A. 

Twenty-five introductory and six intermediate statistical 
methods textbooks were selected for review; all but six of the
31 books were published in 1990 or later. We chose these books
because we are either currently using them, have used them
recently, or have considered them as a primary text for the
statistical methods classes that we teach. A complete listing of 
the books is presented in Appendix B. 

While there are many preliminary tests that can be and are 
conducted by researchers, the focus of the present paper is on 
four tests: the omnibus F-test in analysis of variance, tests for
variance equality, tests for equality of regression slopes in
analysis of covariance, and tests for sphericity in repeated
measures designs. These tests were chosen because we felt that
they are well known by applied researchers, because they are
frequently included in statistical methods textbooks, and because
these tests are often considered in conjunction with the research
designs frequently used by applied researchers in the behavioral
sciences.

Omnibus Tests 

ANOVA F-test 

Part of the rationale for the development of the analysis of 
variance (ANOVA) F-test was to allow researchers to compare 
several population means simultaneously. Multiple two-group
t-tests for pairwise comparisons have been discouraged by some
(e.g., Stevens, 1990, p. 32) because the overall probability of
at least one Type I error can be quite large depending on the
number of tests. [The probability of at least one Type I error
among k tests is no more than $1-(1-\alpha)^k$, where $\alpha$ is the
probability of committing a Type I error with each of the k
tests.] The most important limitation of the omnibus F-test is
that it is so general that it typically does not address an 
interesting substantive question. The rejection of the null 
hypothesis simply means that there is sufficient evidence to 
conclude that the populations from which the samples were 
selected do not have identical means. This is an answer to a 
question that is rarely, if ever, of interest to the applied 
researcher. Answers to substantive questions of typical
interest to applied researchers require specific contrast 
analyses. In our review of 54 published articles, 31 of the 
studies involved an explanatory variable with more than two 
levels, and the analysis for each of these 31 studies began with 
an omnibus F-test. Several of these articles had explicitly 
stated research questions that would be answered appropriately 
with a specific pairwise or complex contrast. For example, in 
one study the researchers wrote "the first question asked whether
students in mapping treatments, A and B, would score
significantly higher on holistic scores.... than in the
nonmapping, group C treatment." The researchers incorrectly
based the answer to this question on the omnibus F-test.
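
A focused contrast of the kind that research question calls for
can be sketched in a few lines of Python. This is a minimal
illustration under the usual equal-variance ANOVA model, not the
analysis reported in the study quoted above: the scores are
invented, the helper name contrast_t is hypothetical, and the
coefficients (1/2, 1/2, -1) are one reasonable way to code
"treatments A and B versus group C."

```python
from statistics import mean

def contrast_t(groups, coefs):
    """t statistic and error df for a contrast among independent group means.

    Assumes the usual equal-variance ANOVA model (pooled error term).
    """
    if abs(sum(coefs)) > 1e-12:
        raise ValueError("contrast coefficients must sum to zero")
    means = [mean(g) for g in groups]
    n_total = sum(len(g) for g in groups)
    df_error = n_total - len(groups)
    # Pooled within-group variance (the ANOVA mean square error).
    ss_within = sum(sum((y - m) ** 2 for y in g) for g, m in zip(groups, means))
    ms_error = ss_within / df_error
    estimate = sum(c * m for c, m in zip(coefs, means))
    std_error = (ms_error * sum(c ** 2 / len(g) for c, g in zip(coefs, groups))) ** 0.5
    return estimate / std_error, df_error

# Hypothetical scores for mapping treatments A and B and nonmapping group C.
group_a = [14, 16, 15, 17, 18]
group_b = [15, 17, 16, 18, 19]
group_c = [11, 13, 12, 14, 15]

t_stat, df = contrast_t([group_a, group_b, group_c], [0.5, 0.5, -1.0])
print(f"t({df}) = {t_stat:.2f}")
```

The resulting statistic would be referred to a t distribution with
the error degrees of freedom; no omnibus F-test is involved.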

Concern for an inflated Type I error rate may be
overemphasized by instructors and textbook authors. This concern is
not shared by all statisticians (e.g., Saville, 1990). The
overall risk of a Type I error can be controlled by using one of
the many Bonferroni-type adjustments (Li, Olejnik, & Huberty,
1992). In our review of the 31 journal articles, none of the
researchers stopped their analyses following the rejection of the
ANOVA F-test. Each researcher continued by either discussing the
specific group means or by further hypothesis testing with a
contrast procedure. The most popular contrast tests were the
Scheffe test and the Newman-Keuls test. We did not find a single
reference to a Bonferroni-type adjustment.
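
The arithmetic behind such adjustments is simple, as the following
Python sketch suggests. It computes the upper bound on the
probability of at least one Type I error among k tests, the
ordinary Bonferroni per-test level, and Holm's sequentially
rejective procedure as one example of a Bonferroni-type
adjustment. The choice of Holm's procedure, the function names,
and the p-values are assumptions made for illustration.

```python
def familywise_bound(alpha, k):
    """Upper bound 1 - (1 - alpha)^k on the chance of at least one Type I error."""
    return 1 - (1 - alpha) ** k

def bonferroni_alpha(alpha, k):
    """Per-test significance level that keeps the familywise rate at or below alpha."""
    return alpha / k

def holm_reject(p_values, alpha=0.05):
    """Holm's sequentially rejective Bonferroni procedure.

    Returns a list of booleans in the original order of p_values.
    """
    k = len(p_values)
    order = sorted(range(k), key=lambda i: p_values[i])
    reject = [False] * k
    for rank, i in enumerate(order):
        # Compare the (rank + 1)-th smallest p-value with alpha / (k - rank).
        if p_values[i] <= alpha / (k - rank):
            reject[i] = True
        else:
            break  # once one test fails, all larger p-values are retained
    return reject

# Hypothetical p-values from three planned contrasts.
print(round(familywise_bound(0.05, 3), 4))
print(bonferroni_alpha(0.05, 3))
print(holm_reject([0.030, 0.004, 0.020]))
```

With three tests at a per-test level of .05 the bound is about
.14. Holm's procedure rejects all three hypothetical hypotheses
here, whereas the unadjusted Bonferroni level of .05/3 would
reject only the one with the smallest p-value.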

Textbooks generally emphasize the use of the ANOVA F-test 
followed by one of several contrast analyses. Many discussions 
of contrast procedures referred to post hoc techniques. Fourteen 
of the introductory texts took this approach. Authors often 
mislead readers to believe that contrast procedures can only be 
used after the omnibus hypothesis test has been rejected. While 
some procedures do require the omnibus test, most do not. We did
not find a single instance where the procedure developed by Tukey
took precedence over testing the omnibus hypothesis.

Among the six intermediate statistical methods textbooks 
reviewed, only the Maxwell and Delaney (1990) text suggests that 
contrasts can be tested instead of conducting the omnibus test. 
The five remaining texts do state that if planned contrasts are 
examined the omnibus test is unnecessary, but they also indicate 
that if post hoc procedures are of interest the omnibus test must 
be conducted first. 

We recognize, however, that there are situations where an 
omnibus F-test can be useful. One such situation involves the 
test for an interaction in a factorial design. A test for the
interaction may precede contrast analyses by guiding the data
analyst to construct the contrasts using cell rather than
marginal means. It might be noted that Tukey (1991) has
suggested that contrasts involving cell effects after removing main
and interaction effects may be of interest, thus precluding the
test for an interaction.

Another context where the preliminary omnibus F-test may be 
useful is when all pairwise contrasts are of interest to the 
researcher. Hayter (1986) demonstrated that the omnibus F-test 
can be used in conjunction with a contrast analysis procedure to 
enhance the statistical power. Alternatively, Shaffer (1986) 
recommends the omnibus F-test as a preliminary test to a 
sequential multiple range contrast procedure. Seaman, Levin, and 
Serlin (1991) studied these approaches and concluded that they 
both can be useful. If all pairwise contrasts are not of 
interest, neither the Hayter nor the Shaffer procedures would be 
used.

Other Omnibus Tests 

While some writers (e.g., Maxwell & Delaney, 1990, p. 200; 
Toothaker, 1991, p. 55; Tukey, 1992) have encouraged researchers 
to ignore the overall test of equal means if it does not pertain 
to a substantive question of interest, such a recommendation is
virtually unheard of when it comes to testing the equality of 
proportions. Three situations in which an omnibus test might be 
bypassed are: (1) two groups, polytomous outcome; (2) multiple 
groups, dichotomous outcome; and (3) multiple groups, polytomous
outcome. In each of these situations, questions more specific
than omnibus questions would typically be of greater substantive
interest. Notationally, let $p_{ij}$ denote the true proportion of
experimental units in Group j who were expected to respond
according to Category i with respect to the categorical outcome
variable. For situation (1), a specific null hypothesis might be
$H_0: p_{31} - p_{32} = 0$; for situation (2), $H_0: p_{11} - p_{14} = 0$; and for
situation (3), $H_0: 2p_{32} - p_{33} - p_{34} = 0$. The omnibus test for any of
these three situations typically pertains to the independence of 
the grouping variable and the (categorical) outcome variable.
[The statistic used is the so-called "Pearson chi-square(d)"
statistic.] Rejection of the null hypothesis is not seen as
being too substantively informative in most situations; tests of
more specific hypotheses would, in many cases, be more
informative.

Another omnibus test that yields very limited substantive 
information is that for a multivariate analysis of variance 
(MANOVA) conducted prior to multiple ANOVAs. It is often
implicitly or explicitly argued that a MANOVA rejection gives the 
researcher a "license" to proceed to the use of multiple ANOVAs. 
This rationale has been rebuked by Huberty and Morris (1989). 

There is another preliminary null hypothesis in the multiple 
response variable arena that has been advocated by some writers. 
This test involves a true correlation matrix, R. [In the SPSS 
MANOVA procedure, this test is called "Bartlett's test of 
sphericity."] Testing $H_0$: R = I (the identity matrix) makes
sense to us in one context, but not in another. The sensible
context is that of factor analyzing the sample correlation
matrix, R. As McDonald (1984, p. 24) points out, "It is the
obvious test to use as a general protection against foolish
optimism when hunting for relations in a mass of data."
Considering $H_0$: R = I as a hypothesis prior to examining
individual bivariate correlations for statistical significance 
(e.g., Steiger, 1980) is not judged to be defensible. To us, 
this is analogous to employing MANOVA prior to multiple ANOVAs. 

Finally, an omnibus preliminary test that is suggested by 
some methodologists (e.g., Cliff, 1987, p. 431) is to conduct a
canonical correlation analysis (CCA) and, if "significant
results" are obtained, conduct several multiple
correlation/regression analyses. If the discovery of canonical
variates is not of substantive interest, then conducting a CCA is
judged irrelevant.

Tests for Model Fit

Variance Equality 

Statistical methodologists have studied extensively the 
effect of variance heterogeneity on the validity of the ANOVA
F-test and the two-group t-test for means. The results of these
studies consistently show that the violation of the equal 
variance assumption can result in an increased risk of a Type I 
error when population variances are negatively related to sample 
sizes (Glass, Peckham, & Sanders, 1972; Milligan, Wong, & 
Thompson, 1987; Tomarken & Serlin, 1986). Even when there is no
relationship between sample size and variance, an increased risk 
of a Type I error can occur if there are substantial differences 
in the variances (Wilcox, Charlin, & Thompson, 1986). Because of 
these results, researchers might be expected to examine the 
sample variances to try to determine whether there is evidence to 
indicate that the assumption has been violated. Several 
statistical tests have been developed to compare variances, 
including the Hartley F-max test, the Cochran test, and the 
Bartlett test. Since the violation of the assumption can 
threaten the statistical validity of the test of means, it might 
seem to be an important consideration to be addressed by textbook 
authors of introductory and intermediate texts. In our review, 
only the textbook by Popham and Sirotnik (1992) recommends the 
preliminary test in the two group case and only Heiman (1992) 
recommends the test in the multiple group situation. Most of the 
texts mention the assumption but generally ignore the problem
with respect to testing hypotheses on the means. Among the
intermediate textbooks only Stevens (1990) and Keppel (1991)
suggest formal testing for variance equality.

In our review of the journal articles only two articles
commented on the apparent inequality of the population variances
and neither used a statistical procedure in a formal test of the
assumption.

Although variance inequality is a serious threat to the 
statistical validity of tests for mean equality, we do not 
believe that a test for the violation of the assumption is
warranted or advised. Several tests have been developed to test
the equality of population variances but most of these tests are
sensitive to non-normality (Conover, Johnson, & Johnson, 1981).
Tests for variance homogeneity that are robust to non-normality
(Brown & Forsythe, 1974; O'Brien, 1978; Ramsey & Brailsford,
1990) are not sufficiently sensitive to a violation of the
assumption (Olejnik, 1987; Wilcox et al., 1986). More importantly,
alternatives to the traditional ANOVA F-test and two-group t-test
are available. Specifically, the Welch solution to the
Behrens-Fisher problem is available in the SAS TTEST procedure and the
James second-order test is available for analysis of variance
(Oshima & Algina, 1992). Moser and Stevens (1992) also recommend
the Welch alternative in the two-group case when variances
are unknown. Although these tests are approximate tests, they do
limit the risk of a Type I error below the nominal level. In
terms of statistical power they are only slightly less powerful
than the independent samples t-test or the ANOVA F-test. The
textbook by Moore and McCabe (1989) introduces the Welch
procedure in discussing the test for two independent means but
fails to continue with this position in the multiple-group
situation. Contrast procedures are also available which do not
require equal variances (Dunnett, 1980; Games & Howell, 1976;
Hsuing & Olejnik, 1991).
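
For the two-group case, the Welch statistic and its approximate
degrees of freedom are easy to compute directly. The Python
sketch below is a minimal illustration using only the standard
library; the function name and the data are hypothetical, and the
degrees of freedom follow the Welch-Satterthwaite approximation.

```python
from statistics import mean, variance

def welch_t(sample1, sample2):
    """Welch's t statistic and Welch-Satterthwaite approximate df."""
    n1, n2 = len(sample1), len(sample2)
    # Squared standard error of each mean, using unpooled sample variances.
    se1 = variance(sample1) / n1
    se2 = variance(sample2) / n2
    t_stat = (mean(sample1) - mean(sample2)) / (se1 + se2) ** 0.5
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t_stat, df

# Hypothetical data: the smaller group has the larger variance, the
# pairing under which the pooled-variance t-test is most liberal.
small_noisy = [2, 8, 14, 20, 26]
large_stable = [9, 10, 11, 9, 10, 11, 9, 10, 11, 10]

t_stat, df = welch_t(small_noisy, large_stable)
print(f"t = {t_stat:.3f}, df = {df:.2f}")
```

No preliminary test of variance equality is needed to use this
statistic; the degrees of freedom shrink toward those of the
noisier group when the variances differ.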

Test for Equal Slopes

A fairly common statistical procedure in the behavioral
sciences is analysis of covariance. The primary purpose of this
technique is to reduce error variance in an experimental study
and to attempt to equate comparison groups in a non-experimental
study (Porter & Raudenbush, 1987). Both purposes are achieved
by considering the relationship between the covariate and the 
outcome variable, which is assumed to be the same for all 
populations being compared. When the relationship between 
outcome variable and the covariate differs as a function of 
levels of the grouping variable, there is an interaction between 
the covariate and the grouping variable. If there is an 
interaction, then the interpretation of the main effect for the 
grouping variable can be ambiguous. Consequently, as with the 
factorial ANOVA, the test of the interaction generally precedes 
the test of the grouping variable main effect. 

Three of the introductory statistical methods texts that we 
reviewed commented on this test and only two of them (Glass & 
Hopkins, 1984; Hays, 1988) provide sufficient information to 
compute the test statistic. All six of the intermediate texts we 
examined recommend that the preliminary test be conducted. 

Nine of the journal articles we reviewed used analysis of 
covariance. Only two of the articles commented on examining the 
within-groups regression slopes, and neither of these studies 
rejected the equal slopes hypothesis. 

We believe that the test for the equality of regression 
slopes is useful but cannot be relied upon except for situations 
where the violation of the assumption is extreme. Rogosa (1980) 
pointed out that the preliminary test for slope equality had both
statistical and logical limitations. If the sample size is large,
trivial differences in slopes will lead to the rejection of the
model, while small sample sizes can result in accepting an
inappropriate model. Furthermore, small differences in the
slopes may not invalidate the conclusion regarding group 
differences on the response measure. 

The issue of statistical power can be evaluated by 
estimating the necessary sample size needed to detect the 
interaction. Using the Cohen (1988) power analysis tables, the 
sample sizes needed to test the equality of two regression slopes 
were determined for three levels of statistical power. Table 1 
summarizes our results using the Cohen definition for a small, 
medium, and large difference between two population standardized 
regression coefficients. Examples of pairs of standardized
regression coefficients reflecting a small difference are .60, .66
or .10, -.10; a pair of coefficients of .60, .76 or .10, -.20
reflects a medium difference; and a large difference is reflected
by pairs of coefficients equalling .60, .83 or .10, -.40.



Insert Table 1 here 



Because a Type II error would be more serious in this case, the 
statistical power should be set no less than .9 and the Type I 
error rate may be relaxed to equal .20. From Table 1 a sample 
size of at least 56 units from each population would be needed to 
detect a large difference between two regression slopes. Smaller
differences in slopes would require many more units. 
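
Sample sizes of this kind can be approximated without the printed
tables. The Python sketch below assumes that the effect size is
Cohen's q, the difference between Fisher z-transformed
coefficients (.10, .30, and .50 for small, medium, and large),
and uses the standard normal approximation for a two-tailed test
with equal group sizes. Both the formula and the function names
are assumptions for illustration rather than a description of how
Table 1 was produced.

```python
from math import atanh
from statistics import NormalDist

def cohen_q(r1, r2):
    """Cohen's q: difference between Fisher z-transformed coefficients."""
    return abs(atanh(r1) - atanh(r2))

def n_per_group(q, alpha, power):
    """Approximate per-group n for a two-tailed test of two independent slopes.

    Normal approximation: n = 2 * ((z_{1 - alpha/2} + z_{power}) / q)^2 + 3.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return round(2 * ((z_alpha + z_power) / q) ** 2 + 3)

print(round(cohen_q(0.60, 0.76), 2))
for label, q in [("small", 0.10), ("medium", 0.30), ("large", 0.50)]:
    print(label, n_per_group(q, alpha=0.20, power=0.90))
```

Under these assumptions the sketch returns 56 units per group for
a large difference with power .9 and a Type I error rate of .20,
in agreement with the figure cited above.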

In our review of the nine studies using analysis of 
covariance only one study had a sample size meeting this 
requirement. Typically, the sample size per treatment condition 
approximately equalled 30 units. This is not very surprising. 
If a researcher had planned the research study to test the 
hypothesis that two population means were equal and set the Type
I error rate equal to .05 and the Type II error rate equal to
.20, then using the Cohen definition of a medium effect size as
the criterion of practical significance and assuming a
correlation of .70 between the covariate and the outcome, the
researcher would find (using the Cohen power tables) that a
sample size of 32 was sufficient to test the hypothesis that the two
populations have equal means on an outcome of interest. However,
with this sample size, the power to test the equality of slopes
would equal .73 for a large difference in slopes and less than 
.50 for medium and small differences in slopes when the test of 
equal slopes has a Type I error rate of .20. 
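
These power figures can be checked approximately with the same
normal approximation for Cohen's q. The Python sketch below is an
illustration under that assumption; the function name is
hypothetical, and the effect sizes .50, .30, and .10 are Cohen's
conventional large, medium, and small values of q.

```python
from statistics import NormalDist

def slope_test_power(q, n_per_group, alpha):
    """Approximate power of a two-tailed test that two slopes are equal.

    Uses the Fisher z normal approximation; the negligible chance of
    rejecting in the wrong direction is ignored.
    """
    z = NormalDist()
    noncentrality = q * ((n_per_group - 3) / 2) ** 0.5
    return z.cdf(noncentrality - z.inv_cdf(1 - alpha / 2))

for label, q in [("large", 0.50), ("medium", 0.30), ("small", 0.10)]:
    print(label, round(slope_test_power(q, n_per_group=32, alpha=0.20), 2))
```

With 32 units per group and a Type I error rate of .20, the
approximation gives power of about .73 for a large difference and
values below .50 for medium and small differences.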

These examples demonstrate a serious problem with the 
preliminary test for the equality of regression slopes in 
analysis of covariance. That is, the sample size necessary to
test the equality of means is considerably less than the sample
size needed to test the equality of the within-group regression
slopes.

We believe that researchers should not rely on the 
preliminary test for guidance on the correct statistical model.
As an alternative, the data should be analyzed allowing the
slopes to differ as Rogosa (1980) suggested (also see Maxwell &
Delaney, 1990, pp. 406-420). Specific hypotheses can then be
tested through simultaneous contrast analyses for all relevant
levels of the covariate. If the slopes are in fact equal and 
this analysis is conducted, some statistical power will be lost. 
Chou (1991) showed, however, that the reduction in statistical 
power was small when the slopes were equal. 
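
The idea of letting the slopes differ can be illustrated with a
short Python sketch that fits a separate regression line to each
group and then estimates the group difference at several values
of the covariate. Only point estimates are shown; the standard
errors and simultaneous critical values needed for the contrast
analyses described above are omitted. The data and function names
are hypothetical.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares intercept and slope for one group."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def group_difference(fit_1, fit_2, covariate_value):
    """Difference between two groups' predicted outcomes at one covariate value."""
    (a1, b1), (a2, b2) = fit_1, fit_2
    return (a1 + b1 * covariate_value) - (a2 + b2 * covariate_value)

# Hypothetical covariate (x) and outcome (y) scores for two groups.
treat_x, treat_y = [10, 20, 30, 40], [22, 30, 38, 46]
control_x, control_y = [10, 20, 30, 40], [15, 20, 25, 30]

treat_fit = fit_line(treat_x, treat_y)
control_fit = fit_line(control_x, control_y)
for x0 in (10, 25, 40):
    print(x0, round(group_difference(treat_fit, control_fit, x0), 2))
```

Because the fitted slopes differ in this example, the estimated
group difference changes with the covariate value, which is why
the comparison is made at more than one point.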

Test for Sphericity

Educational research studies often involve the repeated 
measurement of experimental units. In our review of the journal 
articles we found 19 studies that used a repeated measures 
design. Seven of these studies included at least three measures 
on each subject and 12 studies included only two measures. The 
statistical model for the design with more than two measures
assumes that the variances of the difference scores between
pairs of measures are equal. This assumption is known as the sphericity
assumption. When the assumption is violated, the univariate 
hypothesis test will have a Type I error rate that exceeds the 
nominal significance level. If the assumption is violated, a 
multivariate test, which does not assume sphericity, can be used. 
When the sphericity assumption is met, the univariate test is 
more powerful than the multivariate test. Consequently, a test 
for sphericity might seem appropriate to determine whether a
univariate or a multivariate test should be conducted. Several
alternatives have been suggested.
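
What the sphericity assumption says about the data can be seen by
computing the variance of the difference scores for every pair of
measures. The Python sketch below does only that descriptive
step; it is not one of the formal tests for sphericity, and the
scores and function name are hypothetical.

```python
from itertools import combinations
from statistics import variance

def difference_variances(scores):
    """Sample variance of the difference scores for every pair of measures.

    `scores` maps a measure name to a list of scores, one per subject,
    with subjects in the same order for every measure.
    """
    return {
        (a, b): variance([ya - yb for ya, yb in zip(scores[a], scores[b])])
        for a, b in combinations(scores, 2)
    }

# Hypothetical scores for five subjects measured on three occasions.
scores = {
    "time1": [10, 12, 11, 14, 13],
    "time2": [12, 13, 14, 15, 16],
    "time3": [11, 18, 12, 22, 17],
}
for pair, var in difference_variances(scores).items():
    print(pair, round(var, 2))
```

Under sphericity the population values of these variances would
all be equal; widely different sample values, as in this invented
example, are the kind of pattern the assumption rules out.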

In our review of the textbooks none of the introductory 
textbooks discussed the repeated measures design. All six of the 
intermediate texts discuss repeated measures designs and only the 
Lomax (1992) text did not discuss the sphericity issue at all. 
Winer et al. (1991) and Kirk (1982) discuss a test for sphericity
but do not recommend it, and the remaining texts do not recommend
the tests but suggest using the univariate approach with an
adjusted degrees-of-freedom value for the computed test statistic
to create a conservative test.

Seven of the 19 studies in our sample of journal articles 
involved a within-subjects factor having three or more levels.
None of these studies commented on the sphericity assumption and
none of them used an adjusted degrees-of-freedom test or a
multivariate test.

The test for sphericity has received considerable attention 
in the statistical literature. Robey and Barcikowski (1987) 
discuss and review five tests for sphericity, and recommend that 
researchers forego any of the tests since they are all sensitive 
to nonnormality. Looney and Stanley (1989) discourage the test
for sphericity in favor of using both the univariate test and the
multivariate test, each using a reduced statistical criterion
($\alpha/2$). On the other hand, Cornell, Young, Seaman, and Kirk
(1992) have recently examined the statistical power for eight
tests for sphericity and concluded that these tests are sensitive
to violations of the sphericity assumption when population
distributions are normal. They recommend a preliminary test for
normality in addition to the preliminary test for sphericity. 

We agree with the textbook authors who discourage the use of 
preliminary tests for sphericity. Our position is based on two 
considerations. First, the tests for sphericity are sensitive to 
the assumption of multivariate normality. Micceri (1990) has 
provided convincing evidence to indicate that many variables 
studied by behavioral researchers are not normally distributed. 
There is ample reason therefore to doubt the validity of these 
tests for sphericity with data from the behavioral sciences. 

Our second reason for rejecting the sphericity test is our 
belief that omnibus tests are generally not needed and that 
contrast analyses are more appropriate. The sphericity 
assumption is necessary only for the omnibus test involving the 
repeated measures factor. As we pointed out earlier the omnibus 
test is too general to be of interest to most serious 
researchers. Consequently, multiple related samples t-tests as 
suggested by Maxwell (1980) seem more appropriate to us (also see 
Toothaker, 1991, p. 134). Control for an inflated Type I error 
rate can be provided through a Bonferroni-type adjustment
(Holland, 1991; Keselman, Keselman, & Shaffer, 1991).

Other Tests for Model Fit

The recommendation that measures on response variables be 
examined for fit with theoretical normal distributions is 
sometimes suggested by textbook authors. Although there are 
formal statistical tests for normality, the suggestion often 
given is the use of an "eyeball test" that would be made via a
quantile-quantile plot (Moore & McCabe, 1989, p. 65) or a
residual plot (Neter, Wasserman, & Whitmore, 1988, p. 734).
Residual plots may also be employed in assessing linearity in 
correlation/regression analyses. In the special bivariate 
situation, scatter-plots should be routinely generated prior to 
the calculation and interpretation of a Pearson correlation 
coefficient. 

Finally, there is the problematic test of the equality of 
population correlation matrices, a test often considered prior to 
a MANOVA. This may also be considered a "test for linearity" in 
the context of predictive discriminant analysis. This test is 
problematic because of its extensive statistical power for 
samples of respectable sizes and because of its reliance on the 
troublesome condition of multivariate normality. 

Exploratory Tests

A third type of preliminary test might be classified as an
exploratory test. Such a test arises in two situations: first,
when researchers do not have a strong theoretical model to drive
their data collection and analysis; and second, when the
population being studied is not well understood or clearly
defined. When new inquiries are made into some phenomenon that
does not have a theoretical base, the constructs or the 
indicators of the constructs under investigation are often not 
well understood. Consequently, researchers sometimes take a
"shotgun" approach to data collection. Rather than a focused
inquiry using a limited number of construct indicators, multiple
indicators are often used and preliminary tests are conducted to 
better understand the interrelationships among the indicators and 
possibly to reduce the number of indicators used in the primary 
analysis to answer the research question. So, for example, a 
preliminary analysis using principal components or factor 
analysis might be used to reduce the number of indicators used in 
a primary analysis. Also, measures for multicollinearity might be
examined prior to a multiple regression analysis to reduce the 
redundancy among the explanatory indicators. 

When the population being studied is not clearly defined or 
understood, preliminary analyses might be carried out to gain 
some insight into the characteristics of the subjects studied. 
Cluster analysis might be carried out to group the units into 
more homogeneous subgroups. The examination of outliers using 
the Cook (1977) distance statistic can be used to identify 
experimental units which do not belong with the others in the 
data set (e.g., Bollen & Jackman, 1990).
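
For a simple linear regression, the Cook distance statistic can
be computed from the residuals and leverages of a single fit. The
Python sketch below is a minimal illustration with invented data
and a hypothetical function name; it applies no cutoff and simply
reports which case has the largest distance.

```python
from statistics import mean

def cooks_distances(x, y):
    """Cook's distance for each case in a simple linear regression of y on x."""
    n, p = len(x), 2  # p = number of fitted coefficients (intercept, slope)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    mse = sum(e ** 2 for e in residuals) / (n - p)
    # Leverage of each case (diagonal of the hat matrix).
    leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    return [
        e ** 2 / (p * mse) * h / (1 - h) ** 2
        for e, h in zip(residuals, leverage)
    ]

# Hypothetical data; the last case departs from the trend of the others.
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 4.0]
distances = cooks_distances(x, y)
most_influential = max(range(len(x)), key=distances.__getitem__)
print(most_influential, [round(d, 2) for d in distances])
```

Cases with distances far larger than the rest are candidates for
closer inspection rather than automatic deletion.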

Such preliminary tests can at times be very useful to a data 
analyst, and researchers should be encouraged to use them. In 
our review of the journal articles we did not come across a 
single example where exploratory tests were conducted. We were 
somewhat surprised and disappointed that none of the authors 
commented on any effort to identify outliers. We can only assume 
that such analyses were not conducted. The textbooks we examined 
generally do not encourage researchers to conduct exploratory 
analyses. The intermediate textbooks do comment on tests for
outliers but we feel that these tests should be given greater and 
earlier emphasis. 

Conclusion 

In general, it is our position that many preliminary 
statistical tests are not necessary. As we discussed above, in 
many contexts omnibus tests do not answer questions of 
substantive interest to the applied researcher. In terms of 
tests for model fit, preliminary analyses are unnecessary because 
alternative less restrictive models can be used, and because many 
tests for violations of data assumptions either lack adequate 
statistical power or are overly sensitive to another assumption 
violation. Consequently, these preliminary tests are often 
uninformative at best and can seriously mislead the data analyst.
As an exploratory technique, however, preliminary analyses can be
extremely useful; these analyses typically do not rely on
statistical criteria.

In general we advocate that researchers think about their 
research problem to identify the specific questions to answer and 
to test only those hypotheses of interest. Most of the studies 
that we reviewed had specific questions in mind when they planned 
their research; unfortunately the researchers felt compelled 
(probably because of tradition and training) to use an analysis 
strategy that did not address the question at hand. 

We think that researchers should become more knowledgeable 
about the population they are studying and about the constructs 
under investigation. Based on this knowledge, researchers should
identify the appropriate test statistic that is valid for the
situation.

We also believe that alternative analysis techniques which 
minimize model assumptions should be encouraged and used. 
Appropriate less restrictive models are available and the only 
serious consequence of using these models when more restrictive 
models can be used may be a loss of statistical power. 
Additional research in this area is needed but there is some 
evidence to indicate that in some contexts the loss in 
statistical power is not great. 

Finally, exploratory analyses which contribute to a greater 
understanding of the constructs and populations under 
investigation should be encouraged and expected as a routine 
segment of the data analysis process. Textbook authors should 
emphasize and demonstrate these analyses to a greater extent and 
researchers should comment on these analyses in their articles 
even if the results do not change the planned analysis strategy. 
Table 1

Estimated sample sizes needed to detect a small, medium, and
large difference in two regression slopes for three levels of
power and two levels of Type I error rates

            Probability
Effect      of Type I               Power
Size        Error            .7       .8       .9

Small       .10             944     1240     1716
            .20             655      905     1317

Medium      .10             108      140      193
            .20              75      103      149

Large       .10              42       52       72
            .20              29       39       56